Abstract: Classification may refer to categorization, the
process in which ideas and objects are recognized,
differentiated, and understood. There are many types of
classification methods, and researchers face the problem of
choosing a suitable method that gives good classification
performance on their problems. In this paper, we present
the basic classification techniques. We cover several major
kinds of classification methods, including neural networks,
decision trees, Bayesian networks, support vector machines,
and the k-nearest neighbor classifier. The goal of this
survey is to provide a comprehensive review of these
classification techniques.
I. INTRODUCTION

Classification methods assign data to predefined classes.
A classification method uses a set of features or
parameters to characterize each object, where these
features should be relevant to the task at hand [1].
Classification maps a data item into one of several
predefined categories. Classification algorithms normally
output "classifiers", for example in the form of decision
trees or rules, that can classify new data in the future.
An ideal application in intrusion detection is to gather
sufficient "normal" and "abnormal" audit data for a user
or a program. Here audit data refers to (pre-processed)
records, each with a number of features (fields). A
classification algorithm is then applied to train a
classifier that will label (future) audit data as
belonging to the normal class or the abnormal class [2].
Many decision-making tasks are instances of the
classification problem or can easily be formulated as one,
e.g., prediction and forecasting tasks, diagnosis tasks,
and pattern recognition [15]. Research on linear
classification has been a very active topic [16]. As the
scale of the Internet increases, network traffic
classification is becoming more and more important in
network security, traffic scheduling, and traffic
accounting [17,18].

Classification is used when an object needs to be assigned
to a predefined class or group based on its attributes.
Many real-world applications can be categorized as
classification problems, such as weather forecasting,
credit risk evaluation, medical diagnosis, bankruptcy
prediction, speech recognition, handwritten character
recognition [19], and survival analysis [20].

In recent studies, the performance of different
classification techniques has been evaluated mainly through
experimental approaches [9,10,11]. Empirical comparisons
among different classification methods suggest that no
single method is best for all learning tasks [12,13]. In
other words, each method is best for some tasks, but not
for all.

Classification systems play an important role in business
decision-making tasks by classifying the available
information based on some criteria [4]. The objective of
this paper is to review the well-known classification
methods: neural networks, decision trees, Naive Bayes,
support vector machines, and k-nearest neighbor. The rest
of this paper is organized as follows. Section II presents
neural networks. Section III describes decision trees. The
Naive Bayes classifier is discussed in Section IV. Section
V gives details about support vector machines. The
k-nearest neighbor classifier is presented in Section VI.
Finally, Section VII concludes this work.

II. NEURAL NETWORK CLASSIFICATION METHODS

Many types of neural networks (NNs) can be used for
classification, but the most popular are the
backpropagation NN (BPNN) and the radial basis function
(RBF) NN. Artificial neural networks were initially
developed according to the elementary principle of the
operation of the (human) neural system [22]. Since then, a
very large variety of networks has been constructed. All
are composed of units (neurons) and connections between
them, which together determine the behavior of the network.

2.1- Backpropagation Neural Network: 

The literature shows that a BPNN with a single layer of
neurons can classify a set of points perfectly if they are
linearly separable. A BPNN with three layers of weights can
generate arbitrary decision regions, which may be
non-convex and disjoint. A BPNN is built from processing
elements, each of which computes a nonlinear function of
the scalar product of the input vector and a weight
vector [5].

One of the most popular NN algorithms is the
backpropagation (BP) algorithm [23]. After the weights of
the network are chosen randomly, the backpropagation
algorithm is used to compute the necessary corrections. The
algorithm can be decomposed into the following four steps:

i) Feed-forward computation

ii) Backpropagation to the output layer

iii) Backpropagation to the hidden layer

iv) Weight updates
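
The four steps can be sketched in a few lines of plain Python. This is a minimal illustration only, not the formulation of [23]: the network size (two inputs, two hidden units, one output), the sigmoid activation, the squared-error loss, the learning rate, and the logical-OR training data are all assumptions chosen to keep the example small.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, seed=0):
    """Random initial weights; the last entry of each weight row is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(net, x):
    """Step i: feed-forward computation (x is a list of floats)."""
    hidden, output = net
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in hidden]
    y = sigmoid(sum(w * v for w, v in zip(output, h + [1.0])))
    return h, y


def train_step(net, x, target, lr=0.5):
    hidden, output = net
    h, y = forward(net, x)
    # Step ii: error signal at the output layer (squared error, sigmoid unit).
    delta_out = (y - target) * y * (1.0 - y)
    # Step iii: propagate the error signal back to the hidden layer,
    # using the output weights as they were before this update.
    delta_hid = [delta_out * output[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Step iv: weight updates by gradient descent.
    for j, v in enumerate(h + [1.0]):
        output[j] -= lr * delta_out * v
    for j, row in enumerate(hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] -= lr * delta_hid[j] * v


def mean_squared_error(net, data):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical training set: the logical OR of two binary inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = init_network(2, 2)
error_before = mean_squared_error(net, data)
for _ in range(2000):
    for x, t in data:
        train_step(net, x, t)
error_after = mean_squared_error(net, data)
```

Comparing error_before with error_after shows how repeated application of the four steps reduces the training error.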

Here are some situations where a BP NN might be useful:

• A large amount of input/output data is available,
but you are not sure how to relate the input to the output.

• The problem appears to have overwhelming 
complexity, but there is clearly a solution. 

• It is easy to create a number of examples of the 
correct behavior. 

• The solution to the problem may change over
time, within the bounds of the given input and
output parameters (i.e., today 2+2=4, but in the
future we may find that 2+2=3.8).

• Outputs can be "fuzzy", or non-numeric. 

Linear classification is a useful tool in machine learning
and data mining. In contrast to nonlinear classifiers such
as kernel methods, which map data to a higher dimensional
space, linear classifiers work directly on data in the
original input space. While linear classifiers fail to
handle some inseparable data, they may be sufficient for
data in a rich dimensional space. For example, linear
classifiers have been shown to give performance competitive
with nonlinear classifiers on document data. An important
advantage of linear classification is that training and
testing procedures are much more efficient. Therefore,
linear classification can be very useful for some
large-scale applications. Research on linear classification
has recently been a very active topic, and a comprehensive
survey of the recent advances is given in [6].
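
As a concrete instance of a linear classifier that works directly in the input space, the following sketch trains a perceptron. The perceptron is our choice of example and is not taken from [6]; the function names, the learning rate, and the logical-AND data with labels +1 and -1 are illustrative assumptions.

```python
def train_perceptron(samples, labels, epochs=50, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches labels in {+1, -1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: the data are separated
            break
    return w, b


def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable data: logical AND with labels +1 / -1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]
w, b = train_perceptron(samples, labels)
predictions = [perceptron_predict(w, b, x) for x in samples]
```

Both training and prediction cost only one dot product per sample, which is the efficiency advantage described above. On data that are not linearly separable the loop simply stops after the given number of epochs without reaching zero mistakes.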

III. DECISION TREE

A decision tree (DT) is a classification scheme that
generates a tree and a set of rules representing the model
of the different classes from a given dataset. As in [41],
a DT is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node
represents a class or class distribution. The topmost node
in a tree is the root node [25].

Decision trees are usually univariate, since they use a
split based on a single feature at each internal node. Most
decision tree algorithms cannot perform well on problems
that require diagonal partitioning. Each division of the
instance space is orthogonal to the axis of one variable
and parallel to all other axes. Therefore, the regions that
result from partitioning are all hyperrectangles. However,
there are a few methods that construct multivariate trees.
One example is given in [43].
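
The limitation of axis-parallel splits can be seen with a single univariate test. The sketch below searches every feature and threshold for the split with the fewest misclassifications. The helper names and both toy datasets are hypothetical, and counting misclassifications is used here only for simplicity; it is not the splitting criterion of any particular tree algorithm.

```python
from collections import Counter


def minority_count(group):
    """Number of samples in a group that do not belong to its majority class."""
    return len(group) - max(Counter(group).values())


def best_axis_parallel_split(samples, labels):
    """Return (errors, feature_index, threshold) of the best single-feature test."""
    best = None
    for f in range(len(samples[0])):
        values = sorted({x[f] for x in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for x, y in zip(samples, labels) if x[f] <= t]
            right = [y for x, y in zip(samples, labels) if x[f] > t]
            errors = minority_count(left) + minority_count(right)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best


# Classes separated by the axis-parallel boundary x0 = 4.5.
axis_samples = [(1, 5), (2, 1), (3, 4), (6, 2), (7, 5), (8, 1)]
axis_labels = [0, 0, 0, 1, 1, 1]
axis_split = best_axis_parallel_split(axis_samples, axis_labels)

# Classes separated by the diagonal boundary x0 = x1.
diag_samples = [(1, 2), (2, 3), (3, 4), (2, 1), (3, 2), (4, 3)]
diag_labels = [0, 0, 0, 1, 1, 1]
diag_split = best_axis_parallel_split(diag_samples, diag_labels)
```

On the first dataset one test separates the classes without error. On the second, no single univariate test does, so a univariate tree needs several nested splits to approximate the diagonal boundary.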

Decision trees can be a significantly more complex
representation for some concepts because of the replication
problem. One solution is to use an algorithm that
implements complex features at nodes in order to avoid
replication. Markovitch and Rosenstein [42] presented the
FICUS construction algorithm, which receives the standard
input of supervised learning as well as a feature
representation specification, and uses them to produce a
set of generated features. While FICUS is similar in some
aspects to other feature construction algorithms, its main
strength is its generality and flexibility. FICUS was
designed to perform feature generation given any feature
representation specification complying with its
general-purpose grammar. The most well-known algorithm in
the literature for building decision trees is C4.5
(Quinlan, 1993) [44]. C4.5 is an extension of Quinlan's
earlier ID3 algorithm.
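
ID3 selects the attribute to test at each node by information gain, and C4.5 normalizes this quantity into a gain ratio. The following sketch computes both for categorical attributes. It covers only the attribute-scoring step, not a full tree builder, and the toy weather-style records and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, attribute) / split_info


# Hypothetical records: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

Here "outlook" removes all uncertainty about the label while "windy" removes none, so a tree builder using either criterion would test "outlook" at the root.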

IV. NAIVE BAYES CLASSIFIER 

Bayesian networks can efficiently represent complex
probability distributions, and have received much
attention in recent years [14]. During the past decade,
Bayesian networks have gained popularity in AI as a means
of representing and reasoning with uncertain knowledge.
Examples of practical applications include decision
support, safety and risk evaluation, control systems, and
data mining [26]. In the software engineering field,
Bayesian networks have been used by Fenton [46] for
software quality prediction. Naive Bayes is one of the most
effective and efficient classification algorithms [24]. In
classification learning problems, a learner attempts to
construct a classifier from a given set of training
examples with class labels. Abstractly, the probability
model for a classifier is a conditional model

$$P(C \mid F_1, \dots, F_n)$$

over a dependent class variable $C$ with a small number of
outcomes or classes, conditional on several feature
variables $F_1$ through $F_n$. The problem is that if the
number of features $n$ is large, or if a feature can take
on a large number of values, then basing such a model on
probability tables is infeasible. We therefore reformulate
the model to make it tractable. Using Bayes' theorem, we
can write

$$P(C \mid F_1, \dots, F_n) = \frac{P(C)\, P(F_1, \dots, F_n \mid C)}{P(F_1, \dots, F_n)}$$

In plain English the above equation can be written as

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In practice we are interested only in the numerator of the
fraction, since the denominator does not depend on $C$ and
the values of the features $F_i$ are given, so the
denominator is effectively constant. The numerator is
equivalent to the joint probability model

$$P(C, F_1, \dots, F_n)$$

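which can be rewritten as follows, using repeated
application of the definition of conditional probability:

$$
\begin{aligned}
P(C, F_1, \dots, F_n)
&= P(C)\, P(F_1, \dots, F_n \mid C) \\
&= P(C)\, P(F_1 \mid C)\, P(F_2, \dots, F_n \mid C, F_1) \\
&= P(C)\, P(F_1 \mid C)\, P(F_2 \mid C, F_1)\, P(F_3, \dots, F_n \mid C, F_1, F_2) \\
&= P(C)\, P(F_1 \mid C)\, P(F_2 \mid C, F_1)\, P(F_3 \mid C, F_1, F_2)\, P(F_4, \dots, F_n \mid C, F_1, F_2, F_3) \\
&= P(C)\, P(F_1 \mid C)\, P(F_2 \mid C, F_1)\, P(F_3 \mid C, F_1, F_2) \cdots P(F_n \mid C, F_1, F_2, \dots, F_{n-1})
\end{aligned}
$$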

Now the "naive" conditional independence assumptions come
into play: assume that each feature $F_i$ is conditionally
independent of every other feature $F_j$ for $j \neq i$.
This means that

$$P(F_i \mid C, F_j) = P(F_i \mid C)$$

for $i \neq j$, and so the joint model can be expressed as

$$
\begin{aligned}
P(C, F_1, \dots, F_n)
&= P(C)\, P(F_1 \mid C)\, P(F_2 \mid C)\, P(F_3 \mid C) \cdots \\
&= P(C) \prod_{i=1}^{n} P(F_i \mid C)
\end{aligned}
$$

This means that under the above independence assumptions,
the conditional distribution over the class variable $C$
can be expressed as

$$P(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, P(C) \prod_{i=1}^{n} P(F_i \mid C)$$

where $Z$ is a scaling factor dependent only on
$F_1, \dots, F_n$, i.e., a constant if the values of the
feature variables are known.

Models of this form are much more manageable, since they
factor into a so-called class prior $P(C)$ and independent
probability distributions $P(F_i \mid C)$. If there are $k$
classes and if a model for each $P(F_i \mid C = c)$ can be
expressed in terms of $r$ parameters, then the
corresponding naive Bayes model has $(k - 1) + nrk$
parameters. In practice, $k = 2$ and $r = 1$ are common,
and so the total number of parameters of the naive Bayes
model is $2n + 1$, where $n$ is the number of binary
features used for classification and prediction.
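
The factored model and the parameter count can be made concrete with a small sketch for binary features. Several things here are assumptions not stated in the text: the add-one (Laplace) smoothing controlled by alpha, the rule of predicting the class with the highest unnormalized posterior, the function names, and the toy data.

```python
import math


def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate P(C) and P(F_i = 1 | C) from binary feature vectors."""
    n = len(rows[0])
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        prior[c] = len(members) / len(rows)
        # Smoothed estimate of P(F_i = 1 | C = c); alpha is an assumption.
        cond[c] = [
            (sum(r[i] for r in members) + alpha) / (len(members) + 2 * alpha)
            for i in range(n)
        ]
    return prior, cond


def nb_predict(model, x):
    """Return the class maximizing log P(C) + sum_i log P(F_i | C)."""
    prior, cond = model
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for p, v in zip(cond[c], x):
            score += math.log(p if v else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


def parameter_count(k, n, r):
    """(k - 1) class-prior parameters plus r parameters per feature per class."""
    return (k - 1) + n * r * k


# Hypothetical data: the first feature indicates the class, the second is noise.
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = ["spam", "spam", "ham", "ham"]
model = fit_bernoulli_nb(rows, labels)
```

The scaling factor Z never needs to be computed for prediction, because it is the same for every class and does not change which class scores highest. Working with logarithms avoids numerical underflow when n is large.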


General Bayesian network classifiers are known as Bayesian
networks, belief networks, or causal probabilistic
networks. The theoretical concepts of Bayesian networks
were invented by Judea Pearl in the 1980s and are described
in his pioneering book Probabilistic Reasoning in
Intelligent Systems [27]. As noted earlier, practical
applications include decision support, safety and risk
evaluation, control systems, and data mining [32].
State-of-the-art research papers on Bayesian networks are
published in the proceedings of the Annual Conference on
Uncertainty in AI [33]. The theoretical principles of
Bayesian networks are described in several books, for
example [27-31].

V. SUPPORT VECTOR MACHINES 

The Support Vector Machine (SVM) was first introduced in
1992 by Boser, Guyon, and Vapnik at COLT-92. Support vector
machines (SVMs) are a set of related supervised learning
methods used for classification and regression [8]. They
belong to a family of generalized linear classifiers. In
other words, the SVM is a classification and regression
prediction tool that uses machine learning theory to
maximize predictive accuracy while automatically avoiding
over-fitting to the data. SVMs were developed to solve the
classification problem, but they have since been extended
to solve regression problems [14].
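
The text does not describe how an SVM is trained, so the following is only a sketch under stated assumptions: a linear soft-margin SVM fitted by batch subgradient descent on the regularized hinge loss. The regularization strength lam, the learning rate, the number of epochs, the function names, and the toy data are all illustrative; practical SVM solvers usually work on the dual problem and support kernels.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))); labels are +1/-1."""
    n = len(samples[0])
    m = len(samples)
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # inside the margin or misclassified
                for i in range(n):
                    grad_w[i] -= y * x[i] / m
                grad_b -= y / m
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, well-separated two-class data.
samples = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
predictions = [svm_predict(w, b, x) for x in samples]
```

Only points with margin below 1 contribute to the update, which mirrors the idea that the decision boundary is determined by the samples closest to it. The regularization term is what discourages over-fitting in this formulation.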

VI. K-NEAREST NEIGHBOR 

K-Nearest Neighbor (KNN) is one of the most popular
algorithms for text categorization [34]. Many researchers
have found that the KNN algorithm achieves very good
performance in their experiments on different data sets
[35,7,36]. The idea behind the k-nearest neighbor algorithm
is quite straightforward. To classify a new document, the
system finds the k nearest neighbors among the training
documents, and uses the categories of the k nearest
neighbors to weight the category candidates [34]. One
drawback of the KNN algorithm is its low efficiency, as it
needs to compare a test document with all samples in the
training set. In addition, the performance of this
algorithm depends greatly on two factors: a suitable
similarity function and an appropriate value for the
parameter k. KNN is a fundamental classification technique
and one of the simplest, suited to cases where there is
little or no prior knowledge about the distribution of the
data [37-40].
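
A minimal sketch of the procedure is given below. It simplifies the description above in two ways that are our assumptions: Euclidean distance stands in for the similarity function, and the k neighbors cast equal (unweighted) votes. The two-dimensional points and category names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training samples."""
    # The query is compared with every training sample, which is the
    # efficiency drawback noted in the text.
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
near_first_cluster = knn_predict(train_x, train_y, (2, 2))
near_second_cluster = knn_predict(train_x, train_y, (7, 7))
```

Changing the distance function or the value of k changes which neighbors vote, which is why the two factors named above matter so much in practice.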

VII. CONCLUSIONS 

This paper gives a survey of classification methods,
focusing on neural networks, decision trees, Bayesian
networks, support vector machines, and the k-nearest
neighbor classifier.